Spotify Music Exploratory Data Analysis

This report is for ETC5521 Assignment 1 by Team wallaby, comprising Helen Evangelina and Rahul Bharadwaj.

Introduction and Motivation

Music, in a broad sense, is any art composed of sound. It can express people’s thoughts and feelings, reflect the author’s life experience, and bring listeners both aesthetic enjoyment and an outlet for emotion. Music is also a form of social behavior through which people exchange feelings and life experiences.

In ancient times, music was played to add to the enjoyment when the court held a banquet or when talented people visited the landscape. In modern times, classical music has a high barrier to entry and has become the interest of a small group, while pop music (the general name for popular songs, including rock, R&B, Latin, etc.) has gradually developed its own character. Modern songs now hold a leading place in people’s hearts because they convey emotion and life experience so well, and listening to pop music has become one of the most common forms of daily entertainment.

Spotify is a legitimate streaming music service platform, which has been supported by Warner Music, Sony, EMI and other major record companies around the world. Now it has more than 60 million users, and it is the world’s leading large-scale online streaming music playing platform.

Because Spotify holds a large amount of user data, four interested users, Charlie Thompson, Josiah Parry, Donal Phipps and Tom Wolff, created the spotifyr package, which uses Spotify’s API to make it easier for people to learn about their own listening preferences and about mainstream listening trends. In addition to data from the spotifyr package, our data incorporates data from a blog post by Kaylin Pavlik, in which 5000 songs are classified into six main categories (EDM, Latin, pop, R&B, rap, rock). Combining the two sources is very useful for studying the popularity of pop music.

Nowadays, music plays an important role in people’s lives and helps them manage and improve their quality of life. As music fans, we not only enjoy music but also wonder how it moves people with simple tones, rhythms, timbres and words. How popular is each genre? How much influence do the genre and the various attributes of a song have on its popularity? Does music make us dance or sing unconsciously, or does it convey our emotions and reflect our thoughts? Curiosity about these questions drives this analysis.

Analysis Questions

By doing this exploratory data analysis, we want to know:

Primary Question: What audio features are capable of making an impact on the popularity of music artworks and contribute to the emergence of Top Songs?

Sub Questions:

  1. Since 1957, what are the audio features of those top artists who make the most music artworks?

  2. Explore the works of our favorite artist, Coldplay: for example, how much musical positiveness do their albums convey?

  3. There are plenty of modern music genres nowadays. What unique style or charm makes a genre stand out and become people’s first choice?


Questions Added to enhance the scope of the analysis:

  1. How have music characteristics changed over time?

  2. What exactly makes artists stand out even when there are artists doing the same kind of music? What is the Unique Selling Point (USP) of a few particular selected artists?

This helps us enhance the scope of the primary analysis and broadens our understanding of the relations between popularity and audio features.

Data Description

Data Source

Data Collection Methods:

  • The spotifyr package can extract track audio characteristics and other related information from Spotify’s Web API in batches. For example, searching for an artist by name lists all of their albums and songs in seconds.

  • The spotifyr package also records the popularity metrics of tracks and albums, which makes it straightforward to study the relationship between music popularity and music characteristics. Jon Harmon and Neal Grantham then combined data extracted with the spotifyr package with the content of Kaylin Pavlik’s recent blog post, which classifies nearly 5000 songs by genre, to produce the TidyTuesday dataset we use for this assignment.

  • We chose music works created by artists that can be found on Spotify from January 1, 1957 to January 29, 2020.

Data Structure

  • After reading the data into RStudio, we used the glimpse() function to show the content and structure of the data. Here is a brief summary of the data structure:
## Rows: 32,833
## Columns: 24
## $ X                        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
## $ track_id                 <chr> "6f807x0ima9a1j3VPbc7VN", "0r7CVbZTWZgbTCY...
## $ track_name               <chr> "I Don't Care (with Justin Bieber) - Loud ...
## $ track_artist             <chr> "Ed Sheeran", "Maroon 5", "Zara Larsson", ...
## $ track_popularity         <int> 66, 67, 70, 60, 69, 67, 62, 69, 68, 67, 58...
## $ track_album_id           <chr> "2oCs0DGTsRO98Gh5ZSl2Cx", "63rPSO264uRjW1X...
## $ track_album_name         <chr> "I Don't Care (with Justin Bieber) [Loud L...
## $ track_album_release_date <chr> "2019-06-14", "2019-12-13", "2019-07-05", ...
## $ playlist_name            <chr> "Pop Remix", "Pop Remix", "Pop Remix", "Po...
## $ playlist_id              <chr> "37i9dQZF1DXcZDD7cfEKhW", "37i9dQZF1DXcZDD...
## $ playlist_genre           <chr> "pop", "pop", "pop", "pop", "pop", "pop", ...
## $ playlist_subgenre        <chr> "dance pop", "dance pop", "dance pop", "da...
## $ danceability             <dbl> 0.748, 0.726, 0.675, 0.718, 0.650, 0.675, ...
## $ energy                   <dbl> 0.916, 0.815, 0.931, 0.930, 0.833, 0.919, ...
## $ key                      <int> 6, 11, 1, 7, 1, 8, 5, 4, 8, 2, 6, 8, 1, 5,...
## $ loudness                 <dbl> -2.634, -4.969, -3.432, -3.778, -4.672, -5...
## $ mode                     <int> 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, ...
## $ speechiness              <dbl> 0.0583, 0.0373, 0.0742, 0.1020, 0.0359, 0....
## $ acousticness             <dbl> 0.10200, 0.07240, 0.07940, 0.02870, 0.0803...
## $ instrumentalness         <dbl> 0.00e+00, 4.21e-03, 2.33e-05, 9.43e-06, 0....
## $ liveness                 <dbl> 0.0653, 0.3570, 0.1100, 0.2040, 0.0833, 0....
## $ valence                  <dbl> 0.518, 0.693, 0.613, 0.277, 0.725, 0.585, ...
## $ tempo                    <dbl> 122.036, 99.972, 124.008, 121.956, 123.976...
## $ duration_ms              <dbl> 194754, 162600, 176616, 169093, 189052, 16...
  • The spotify_song data is tabular, with 24 columns and 32,833 rows. The variables, their types and the description of each variable are presented in the table below.

Data Table -

A Visual Overview of the Data:

Visual Representation of the Dataset


  • A picture is worth a thousand words, so we show the same column names and types described above in a simple visualization.

  • Since our analysis focuses on correlations between audio features, it is a good idea to have some overview as to how the numerical fields correlate.

A Visual Representation of the Correlation of numeric data


  • The visualization above shows how the numerical variables correlate with one another. This gives us a basic understanding of which correlations to analyze.
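As a minimal sketch of how such a correlation overview can be computed, the base R snippet below builds a small stand-in data frame (the values are made up, not taken from the dataset) and computes pairwise Pearson correlations with cor(). The column names mirror the audio features described above; everything else is an assumption for illustration.

```r
# Toy stand-in for a few numeric columns of the song data (values are made up).
songs <- data.frame(
  energy       = c(0.92, 0.81, 0.35, 0.60, 0.20, 0.75),
  loudness     = c(-2.6, -4.9, -11.2, -7.0, -14.5, -5.1),
  acousticness = c(0.05, 0.10, 0.80, 0.40, 0.90, 0.15),
  valence      = c(0.52, 0.69, 0.30, 0.45, 0.25, 0.60)
)

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric_cols <- songs[sapply(songs, is.numeric)]
cor_matrix <- round(cor(numeric_cols), 2)
print(cor_matrix)
```

The resulting matrix is what a correlation plot shades or labels: each cell is the correlation between the row variable and the column variable.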

Data Cleaning:

  • Now, we will clean the data, select the variables that are useful to our EDA, and retain six major music genres (the proportions of other genres are very low, which can be ignored). And then, we arrange the data from high to low according to track popularity.
Clean Data with necessary columns


  • The above figure gives an overview of the columns necessary for our analysis. This data is clean with less than 0.1% missing data and is ready for analysis.
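The cleaning steps described above (keep the useful columns, retain the six major genres, sort by popularity) could look roughly like the following base R sketch. The toy rows, the "jazz" genre and the choice of kept columns are assumptions for illustration only.

```r
# Toy stand-in for the raw data; rows and values are placeholders.
songs <- data.frame(
  track_name       = c("a", "b", "c", "d", "e"),
  playlist_id      = c("p1", "p2", "p3", "p4", "p5"),
  playlist_genre   = c("pop", "rock", "jazz", "edm", NA),
  track_popularity = c(66, 90, 75, 40, 55),
  danceability     = c(0.75, 0.50, 0.60, 0.80, 0.40)
)

# The six genres retained in the analysis.
keep_genres <- c("edm", "latin", "pop", "r&b", "rap", "rock")
# Columns assumed to be useful for the EDA (hypothetical selection).
keep_cols <- c("track_name", "playlist_genre", "track_popularity", "danceability")

# Filter rows by genre (rows with a missing genre are dropped) and keep the chosen columns.
clean <- songs[songs$playlist_genre %in% keep_genres, keep_cols]
# Arrange from high to low track popularity.
clean <- clean[order(-clean$track_popularity), ]
print(clean)
```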

Data structure

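library(DT)
library(tibble)

# Variable names paired with their descriptions from the table below
variables <- tibble(
  Variable = c("track_id", "track_name", "track_artist", "track_popularity", "track_album_id", "track_album_name", "track_album_release_date"),
  Description = c("Song unique ID", "Song Name", "Song Artist", "Song Popularity (0-100) where higher is better", "Album unique ID", "Song album name", "Date when album released")
)

variable class description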
track_id character Song unique ID
track_name character Song Name
track_artist character Song Artist
track_popularity double Song Popularity (0-100) where higher is better
track_album_id character Album unique ID
track_album_name character Song album name
track_album_release_date character Date when album released
playlist_name character Name of playlist
playlist_id character Playlist ID
playlist_genre character Playlist genre
playlist_subgenre character Playlist subgenre
danceability double Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy double Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
key double The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
loudness double The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
mode double Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
speechiness double Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
acousticness double A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
instrumentalness double Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
liveness double Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
valence double A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
tempo double The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
duration_ms double Duration of song in milliseconds

datatable(variables, filter = "top", rownames = FALSE, options = list(pageLength = 5))

Analysis and Findings

Top Artists

  • With the data arranged from high to low by track popularity, the table shows that Trevor Daniel, Y2K and Don Toliver occupy the first, second and third places respectively. Many other famous artists also appear on the list, such as Drake, Maroon 5 and Ed Sheeran.

  • From the following table and figure of the artists with the most songs, we can see that Queen, Martin Garrix and The Chainsmokers occupy the first, second and third places respectively.

Top 20 Artists who wrote the most songs from 1941 to 2020


  • The figure above shows the top 10 artists with the most songs. A bar plot, rather than a table, makes the top 10 singers easier to remember and gives an intuitive sense of the gaps between them. As mentioned previously, pictures convey much more than tables and text.

  • We filter artists whose popularity is greater than 95, and then visualize it in the form of a radar plot. This way, the singers who are at the top can be clearly identified at a glance. At the same time, music lovers can know the characteristics of these top singers’ music artworks.
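The two selections described above can be sketched in base R as follows. The artists and popularity values are placeholders, and treating "popularity greater than 95" as "at least one track above 95" is our assumption about how the filter is applied.

```r
# Toy stand-in for the cleaned song data (artists and values are placeholders).
songs <- data.frame(
  track_artist     = c("Artist A", "Artist B", "Artist A", "Artist C",
                       "Artist A", "Artist B", "Artist D"),
  track_popularity = c(60, 97, 72, 98, 55, 80, 40)
)

# Number of songs per artist, largest first; head() keeps the top n (here 2).
song_counts <- sort(table(songs$track_artist), decreasing = TRUE)
top_artists <- head(song_counts, 2)
print(top_artists)

# Artists with at least one track whose popularity is greater than 95.
very_popular <- unique(songs$track_artist[songs$track_popularity > 95])
print(very_popular)
```

The first result feeds a bar plot of song counts; the second gives the artists to show in the radar plot.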

Characteristics of Top Singers


  • The height of each pie segment shows the level of popularity. The color intensity shows the energy levels of the songs by that artist and different colors represent genre. The blue outline describes the danceability of the songs. This way, we can perceive three audio features at the same time along with Track Popularity.

  • From the figure above, we can see that Maroon 5, The Weeknd, Roddy Ricch and KAROL G lead in popularity. It is also clear that popular singers usually create songs in several genres rather than limiting themselves to one.

  • Next, the artists differ greatly in musical style. For example, the brightness of the colors shows that the energy of Maroon 5’s and Billie Eilish’s songs is not very high. This is not a shortcoming but a reflection of their style, which is lyrical and soft. Judging from the color of each segment’s outline, Roddy Ricch’s and Trevor Daniel’s works have the highest danceability, a measure that combines tempo, rhythm stability, beat strength and overall regularity.

Analyzing our Favorite Artist - Coldplay

  • In this part, we take one artist as an example for a more detailed exploratory analysis using the “spotifyr” package. We choose Coldplay, our favorite artist.

  • First, we loaded all the Coldplay albums available on Spotify and dropped the duplicates (some live tour albums duplicate existing ones). We then calculated the average valence of each album. The results are shown in the following table.

The Musical Positiveness of Coldplay’s Albums
album_name valence
Everyday Life 0.30
Viva La Vida or Death and All His Friends 0.26
Mylo Xyloto 0.25
Parachutes 0.23
A Head Full of Dreams 0.23
X&Y 0.22
Ghost Stories 0.21
Love in Tokyo 0.19
A Rush of Blood to the Head 0.18
  • According to the Spotify tracks documentation, the valence variable is measured from 0.0 to 1.0 and describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The highest average valence among these albums is 0.30 and the lowest is 0.18, which means Coldplay’s songs usually sound more negative than positive to the audience.
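The per-album averages in the table above can be computed along the following lines. This is a sketch on made-up tracks with hypothetical album names ("Album A", "Album B"), not Coldplay's actual track data; it drops duplicated tracks before averaging, as described above.

```r
# Toy stand-in for one artist's tracks; album names and valence values are made up.
tracks <- data.frame(
  album_name = c("Album A", "Album A", "Album A", "Album B", "Album B", "Album B"),
  track_name = c("t1", "t2", "t1", "t3", "t4", "t5"),
  valence    = c(0.20, 0.26, 0.20, 0.10, 0.30, 0.26)
)

# Drop duplicated tracks (e.g. the same track listed twice for an album).
tracks <- tracks[!duplicated(tracks[c("album_name", "track_name")]), ]

# Average valence per album, sorted from most to least positive.
album_valence <- aggregate(valence ~ album_name, data = tracks, FUN = mean)
album_valence <- album_valence[order(-album_valence$valence), ]
album_valence$valence <- round(album_valence$valence, 2)
print(album_valence)
```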

  • Second, we make a density plot to show the ranges and densities of valence of each album.

Valence Density of Coldplay Albums


  • From the above figure, we can see that “Everyday Life” has the widest range of valence; that is, this album contains a wide variety of emotions. Meanwhile, “A Rush of Blood to the Head” has a narrow range of valence, with its density centered on lower valence values. The audience would probably feel negative emotions such as sadness, depression and anger when listening to this album. This finding surprised us because “A Rush of Blood to the Head” is the second best album in “The Coldplay Albums Ranked”, so we decided to look at it in more depth.
The most frequent words in ‘A Rush of Blood to the Head’
word sentiment n
love positive 7
easy positive 4
fall negative 4
grace positive 4
miss negative 4
  • Lastly, we analyzed the sentiment of this album to see whether the valence of an album is associated with its lyrics. The average sentiment value of this album is -0.47 according to the “afinn” lexicon. We also analyzed the sentiment of the lyrics using the “bing” lexicon. The above table shows the most frequent words in this album and their sentiment. In addition, the figure below shows more intuitively the frequency of words that appear more than once. We can easily see that negative words appear more often than positive ones.

  • As a result, we can say that, in terms of both sound and lyrics, this album conveys negative emotions. Yet this does not stop people from regarding “A Rush of Blood to the Head” as one of Coldplay’s best albums. The audience’s love for an album is not entirely determined by its positiveness but also by how well they can relate to the emotions in the songs. This suggests that people use music for all kinds of emotional experiences, not just to have fun or feel refreshed!
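A lexicon-based word count like the one in the table above can be sketched in base R as below. The lyric lines are invented and the five-word lexicon is a tiny stand-in for a positive/negative lexicon such as “bing”; the word labels follow the table above, everything else is an assumption.

```r
# Invented lyric lines (placeholders, not actual song lyrics).
lyrics <- c("love is easy", "do not fall", "i miss your love")

# Split into lower-case words and drop empty strings.
words <- unlist(strsplit(tolower(lyrics), "[^a-z']+"))
words <- words[nzchar(words)]

# Tiny stand-in for a positive/negative sentiment lexicon.
lexicon <- data.frame(
  word      = c("love", "easy", "grace", "fall", "miss"),
  sentiment = c("positive", "positive", "positive", "negative", "negative")
)

# Inner join: keep only words that appear in the lexicon.
matched <- merge(data.frame(word = words), lexicon, by = "word")
matched$n <- 1

# Count occurrences per word and sentiment, most frequent first.
word_counts <- aggregate(n ~ word + sentiment, data = matched, FUN = sum)
word_counts <- word_counts[order(-word_counts$n), ]
print(word_counts)

# Overall balance of positive and negative words.
print(table(matched$sentiment))
```

Words that are not in the lexicon are simply ignored, so the result depends heavily on which lexicon is used.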

Analyzing the Audio Features

  • In this part, we analyzed the audio features of all the songs in our dataset. Here’s a simple explanation of these features:
    • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.

    • danceability: Danceability describes how suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.

    • duration_ms: The duration of the track in milliseconds. (And duration_s in seconds, rounded.)

    • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

    • instrumentalness: Predicts whether a track contains no vocals.

    • key: The key the track is in.

    • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.

    • loudness: The overall loudness of a track in decibels (dB).

    • mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

    • speechiness: Speechiness detects the presence of spoken words in a track.

    • tempo: The overall estimated tempo of a track in beats per minute (BPM).

    • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

  • The figure below shows how these features are distributed across the different genres.
Audio Feature Density Plot


  • The next three box plots show how music attributes differ between music genres.

  • First, each music genre is mapped to a color, and the mapping is stored in a tibble called “COLORS”. This lets the genres be clearly distinguished by color, so the specific characteristics of each genre can be judged from the box plots.

Average valence by Music Genre


  • The first plot above shows the relationship between music genre and valence. Latin has the highest valence and EDM the lowest. This shows that Latin music conveys musical positiveness most strongly, while EDM sounds more negative. The other four genres show no obvious trend in this respect; their valence mostly lies between 0.3 and 0.7.
Average Energy by Music Genre


  • The second plot above describes the relationship between music genre and energy. Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. EDM has the highest energy, while R&B (rhythm and blues) has the lowest, which reflects the style of these two genres. EDM mostly makes listeners feel energized and sounds loud and noisy, whereas R&B is mainly lyrical, slow and quiet, and brings less energy to listeners. Similarly, rock has always been known for its flexible, bold expression and passionate rhythm, and it ranks second only to EDM.

  • Finally, the third plot, shown below, describes the relationship between music genre and speechiness. Speechiness detects the presence of spoken words in a track: the more words or sentences are spoken in a song, the closer the value is to 1.0. This attribute is interesting because it indicates whether artists tend to express ideas through the lyrics or through the melody.

Average Speechiness by Music Genre


  • In the plot, rap clearly takes first place, because rap is characterized by rapidly delivered rhyming lyrics over a mechanical, rhythmic backing. It is worth noting that rock and pop are the lowest, which suggests that these two genres tend to use melody or rhythm, rather than lyrics, to affect the audience.

  • The three plots cover only some of the attributes, and many related attributes remain unexplored. Our aim was to bring together the most interesting parts. Anyone interested can easily continue and build upon the existing analyses.
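The genre comparisons above rest on two ingredients: a genre-to-color lookup and a per-genre summary of one audio feature. The base R sketch below illustrates both on made-up values; the hex colors and the use of a named vector (instead of the “COLORS” tibble) are assumptions for illustration.

```r
# Hypothetical color for each genre (any distinct colors would do).
genre_colors <- c(edm = "#1b9e77", latin = "#d95f02", pop = "#7570b3",
                  "r&b" = "#e7298a", rap = "#66a61e", rock = "#e6ab02")

# Toy stand-in for the song data (values are made up).
songs <- data.frame(
  playlist_genre = c("rap", "rap", "pop", "pop", "rock", "rock"),
  speechiness    = c(0.30, 0.20, 0.05, 0.07, 0.04, 0.06)
)

# Median speechiness per genre: the center line of each box in a box plot.
median_speechiness <- tapply(songs$speechiness, songs$playlist_genre, median)
print(sort(median_speechiness, decreasing = TRUE))

# Colors in the same order as the genres, ready to pass to a plotting function.
plot_colors <- genre_colors[names(median_speechiness)]
print(plot_colors)
```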

Music Genre and their Popularity - by Decade of release date

  • Having reviewed the relations between audio features and music genres, we can now discuss the music genres in detail. The table below shows the distribution of each genre in this dataset. The most frequent genre is “edm”, while “rock” is the least frequent.
Genres in the dataset
playlist_genre n
edm 6043
rap 5746
pop 5507
r&b 5431
latin 5155
rock 4951

The following figure shows the average popularity of songs released in different decades. To show the result clearly and make comparison easier, we split the result by genre.

Genre Popularity by Decade


  1. EDM music emerged in the 1970s, and its average popularity is 40 or less in every decade. This suggests that EDM is not mainstream music today and appeals to a smaller group.

  2. Latin and pop music have been popular since the 1960s. The 1970s were the golden age for Latin songs, while the 1960s and 1970s were the golden age for pop music. These old songs are popular even today!

  3. R&B music went through ups and downs. Songs released from the 1980s to the 2000s are less popular than the others.

  4. Rap music has been popular since the 1960s, and the oldest rap songs are still the most popular. Songs released in the 2000s have the lowest popularity now.

  5. The popularity of rock music is quite stable across release periods, although songs released from the 1960s to the 1990s are more popular than the others.
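The decade-level averages discussed above can be derived from the release date as in this base R sketch. The rows are made up, and the handling of dates that contain only a year or a year and month is an assumption about how such values are stored.

```r
# Toy stand-in for the song data (values are made up).
songs <- data.frame(
  playlist_genre           = c("edm", "edm", "latin", "latin", "latin"),
  track_album_release_date = c("1979-05-01", "2019-06-14", "1975", "1978-11-02", "1984-03"),
  track_popularity         = c(24, 40, 60, 66, 40)
)

# The first four characters are the year, even for partial dates such as "1975".
release_year <- as.integer(substr(songs$track_album_release_date, 1, 4))
# Integer-divide by 10 and multiply back to get the decade (1979 -> 1970).
songs$decade <- (release_year %/% 10) * 10

# Mean popularity for each genre and decade.
decade_popularity <- aggregate(track_popularity ~ playlist_genre + decade,
                               data = songs, FUN = mean)
print(decade_popularity)
```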

Correlation between Popularity and Audio Features

Internal Relations between Audio Features

The correlations between song features help us explore why some music artworks are popular. Each song has its own specific characteristics, but we can summarize them with ten musical attributes. Any two attributes can be related in one of three ways: negatively correlated, positively correlated or essentially uncorrelated. This matters for our later analysis of the properties of music artworks.

For example, a song with high energy tends to also be loud and is unlikely to be acoustic. Someone who likes songs that are more active or have higher valence could look for potential favorites among songs with high danceability, high energy and more vocal content. The correlation plot is therefore very useful, both for analyzing songs and for choosing which attributes to focus on; the remaining effects can be explored later.

  • We can build upon the correlation plot displayed in Data Description. The plot below displays the correlations numerically, complementing the shades used in the previous plot.
Correlation between Audio Features



Relationship between Popularity and a certain Audio Feature

After describing the unique information about audio features, now we pay attention to exploring whether these audio features contribute to a higher popularity. First we plot each audio feature of the songs and the popularity in the following figure.

Popularity vs Audio Feature


  • The figure shows that liveness has a negative relationship with popularity, and that there is no clear relationship between valence and popularity. A higher valence doesn’t necessarily make a song more popular. This is consistent with our sentiment analysis.

  • We are not sure whether the dot plots above directly reveal the relationship between popularity and the audio features, so we also fit a linear regression model to explore whether these audio features contribute to a higher popularity.

  • Here we filtered the songs with a popularity greater than 0, since a popularity value of 0 does not make sense in this model. The following table shows all the audio features with a p-value less than 0.05. We conclude that danceability and valence contribute most to a higher popularity.

  • Acousticness, key, loudness, mode and tempo also have a positive relationship with popularity, while energy, instrumentalness, liveness and speechiness have a negative relationship with popularity, which is consistent with the conclusions from the dot plots.


## # A tibble: 43 x 3
## # Groups:   genre [6]
##    genre decade mean_popularity
##    <chr> <chr>            <dbl>
##  1 edm   1970              24  
##  2 edm   1980              37.4
##  3 edm   1990              39.3
##  4 edm   2000              20.7
##  5 edm   2010              35.1
##  6 edm   2020              40.3
##  7 latin 1960              26  
##  8 latin 1970              63.2
##  9 latin 1980              39.9
## 10 latin 1990              39.8
## # ... with 33 more rows

Also, We are not sure whether those above dot plots can directly reveal the relationship between these popularity and audio features. So we pay attention to exploring whether these audio features contribute to a higher popularity using a linear regression model just in case. Here we filtered the songs with a popularity greater than 0, since 0 popularity value does not make sense in this model. And the following table shows all the audio features with a p-value less than 0.05. We can draw a conclusion that danceability and valence contribute most to a higher popularity. Acousticness, key, loudness, mode and tempo also have positive relationship with popularity. While energy, instrumentalness, liveness and speechiness have negative relationship with popularity, with is similar with those dot plots conclusion.

lm (popularity ~ features)
term estimate std.error statistic p.value
(Intercept) 744.18 15.80 47.10 0.00
acousticness 18.36 6.85 2.68 0.01
danceability 35.67 9.91 3.60 0.00
duration_ms 0.00 0.00 -14.03 0.00
energy -253.44 11.32 -22.39 0.00
instrumentalness -109.25 6.07 -18.01 0.00
key 0.95 0.35 2.69 0.01
liveness -23.39 8.47 -2.76 0.01
loudness 14.09 0.61 23.07 0.00
mode 6.61 2.58 2.57 0.01
speechiness -43.00 12.83 -3.35 0.00
tempo 0.18 0.05 3.72 0.00
valence 37.22 6.08 6.12 0.00
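
A table like this one can be obtained by fitting `lm()` and keeping the coefficients whose p-value is below 0.05. The sketch below is a hedged illustration in base R, not the code used for the report: the data are simulated, the column name `track_popularity` and the three features are assumptions, and the simulated effect sizes are invented.

```r
set.seed(1)
n <- 500

# Simulated audio features on a 0-1 scale (invented data, for illustration only).
songs <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  liveness     = runif(n)
)

# Invented relationship: danceability raises popularity, energy lowers it.
songs$track_popularity <- round(
  50 + 30 * songs$danceability - 20 * songs$energy + rnorm(n, sd = 5)
)
songs$track_popularity[1:10] <- 0  # a few zero-popularity songs to filter out

# Keep only songs with popularity greater than 0, as described in the text.
songs_nonzero <- subset(songs, track_popularity > 0)

# Fit the linear model and collect the coefficient table.
fit <- lm(track_popularity ~ danceability + energy + liveness, data = songs_nonzero)
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
coefs$term <- rownames(coefs)

# Retain only the terms that are significant at the 5% level.
sig <- coefs[coefs$p.value < 0.05,
             c("term", "estimate", "std.error", "statistic", "p.value")]
print(sig, row.names = FALSE)
```

Note that a coefficient's size depends on the scale of its feature (loudness is in decibels, duration in milliseconds, most other features lie between 0 and 1), so estimates are only directly comparable between features measured on the same scale.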
  • Using geom_smooth, we can get a clearer picture of how popularity is affected by different audio features. The smooth curves show how each audio feature trends as popularity increases.
Popularity vs Audio Feature using Smooth Curves

  • We can observe that most of the audio features have almost no relationship with popularity, except for energy and instrumentalness, which negatively affect popularity, and danceability, which positively affects it. This trend is observed for tracks that have a popularity greater than 50.

  • This leads us to extend the analysis to danceability and check which music genres and artists are in line with this trend. The next analysis pursues our question of what the unique selling point of each selected artist is.
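
To make the idea of a smooth curve concrete: for small data sets, geom_smooth fits a local regression (loess) by default. The sketch below computes such a smooth by hand with base R's `loess()` on simulated data. The variable names and the simulated relationship are assumptions for illustration only and do not reproduce the figure above.

```r
set.seed(42)
n <- 300

# Simulated feature and popularity with an invented upward relationship.
danceability <- runif(n)
popularity   <- 40 + 30 * danceability + rnorm(n, sd = 5)

# Fit a local regression (loess) of popularity on the feature.
fit <- loess(popularity ~ danceability)

# Evaluate the smooth on an evenly spaced grid, as a plotted curve would.
grid <- data.frame(danceability = seq(0.1, 0.9, by = 0.1))
grid$smoothed <- predict(fit, newdata = grid)
print(grid)
```

A rising `smoothed` column corresponds to an upward-sloping curve in the plot, and a flat one to a feature with almost no relationship with popularity.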


Unique Features of Artists

  • Now that we have analyzed the correlation between different audio features, let’s explore which artists are popular and exactly why they are popular. This involves analyzing common audio features in the songs of the top artists.

  • We choose the following artists, each regarded as one of the best in their genre:

    • Taylor Swift - Pop
    • Eminem - Rap
    • AC/DC - Rock
    • Shakira - Latin
    • Usher - R&B
    • David Guetta - EDM
  • We select danceability, speechiness, energy and valence as our audio features since these best describe the genres we have selected.

Taylor Swift (Pop Artist) Audio Features

  • First up, let’s see what Taylor’s pop songs are like.
Taylor Swift Audio Features


Music components over time

Explore the music characteristics over time. How are they changing? Then explore the top 5 artists (according to track_popularity) in terms of their music characteristics over time: are the music characteristics of one artist changing over time?

We have previously discussed the different music components, the correlations between them, and their correlation with track_popularity. Another thing we can look at is the trend of the music components over time. Over time, more musical instruments and more genres have been introduced, changing our taste in music. Analysing the trend of music components over time is therefore useful for understanding how the characteristics of music are changing. Are the music characteristics in 1957 similar to those in 2019? The trend of music genres has been analysed before, and it shows that the genres are changing. Looking at the trend of music components broadens that analysis by checking whether the components are changing too.

The trend of music components over the years.

To answer the research question “How are the music characteristics changing over time?”, we plot the values of the different music components against year, faceted by component. We can see from Figure @ref(fig:components-trend) that some components are changing over time.

Acousticness, mode, tempo and valence tend to decrease over time, while danceability and energy tend to increase. The decreasing valence indicates that new songs tend to be sadder, as more unhappy songs are being released every year. “Happiness and brightness in music has declined, while sadness increased in the last 30 years or so” (https://entertainment.inquirer.net/274757/people-prefer-happy-music-sad-songs-trend-past-30-years-study). Research has suggested that the use of positive emotions has declined. However, there is high variability in the valence values, which means that varying types of songs are still being released. The increase in danceability and energy is likely due to the rise of electronic music.

The other music components tend to follow a steady trend over time. However, the variance of instrumentalness, speechiness, liveness and loudness has increased over time. Loudness tends to range between -20 and 0 decibels, except for some outliers where the loudness is much lower.

The emergence of more modern music with high energy and

The music characteristics might change because of the emergence of new types of music and more modern music. But do the music characteristics also change for an individual artist? It would be interesting to look at the trend to see whether a specific artist also shifts in their music characteristics. Does the same artist make the same kind of music through time?

To answer this question, we look at the top five artists and the trend of their music characteristics over time. We do not use track_popularity to determine the top five artists here, because some artists in the top five by popularity only have songs in a single year, so we would not be able to compare their music characteristics over time. Instead, we use the number of tracks (the count of track_name) per artist. The following table displays the top five artists:

track_artist Total
Queen 111
Martin Garrix 73
David Guetta 64
Logic 62
Hardwell 61
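
Ranking artists by their number of tracks amounts to counting rows per artist and sorting the counts. The sketch below shows one way to do this in base R on invented data; the column names follow the dataset's `track_artist` and `track_name`, while the rows themselves are made up.

```r
# Invented track list, for illustration only.
tracks <- data.frame(
  track_artist = c("Queen", "Queen", "Queen", "Logic", "Logic", "Hardwell"),
  track_name   = c("a", "b", "c", "d", "e", "f")
)

# Count the number of tracks per artist.
counts <- as.data.frame(
  table(track_artist = tracks$track_artist),
  responseName = "Total",
  stringsAsFactors = FALSE
)

# Sort from most to fewest tracks and keep the top two (top five in the report).
counts <- counts[order(-counts$Total), ]
top_artists <- head(counts, 2)
print(top_artists, row.names = FALSE)
```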

Unlike the other four artists, Queen has a different music timeline. Therefore, for a clearer visualisation, we plot Queen separately from the others.

The musical components of the top 5 artists (excluding Queen) over time.

It can be seen from Figure @ref(fig:top4-components) that the music components of individual artists shift over time, and some components are quite volatile. David Guetta, who works in both pop and EDM, shows a large change in danceability and energy: he tends to produce music with lower danceability but higher energy these days. His songs are also shorter than those in the past. The valence of his songs changes over time too: 2010 and 2011 were the years when he produced more positive songs, and his songs have tended to get less positive since then. Hardwell focuses on EDM, which explains the low speechiness. His instrumentalness was fairly high in 2013, but it has shifted towards zero over time. Unlike David Guetta, Hardwell tends to produce more positive songs these days.

As Logic is a rapper, it makes sense that his instrumentalness is very close to zero. The key of his music follows a decreasing trend over time, while the valence, acousticness and liveness of his songs are quite volatile. One interesting thing about Hardwell is that his danceability and tempo decreased around 2016, indicating that he produced less upbeat and slower music in that year; after 2016, his danceability and tempo increase again. Martin Garrix is an EDM artist whose energy, loudness and duration decrease over time. In 2013 and 2014, he produced music with relatively high instrumentalness, but it dropped over the following years. Similar to Hardwell, Martin Garrix creates more positive songs over the years, as indicated by the increasing valence.

A clear thing to notice here is that all of them have relatively low speechiness, as they are all in either the EDM or the rap genre. The most volatile component of all is mode.

The musical components of Queen over time.

Figure @ref(fig:queen-trend) shows the music components of Queen’s songs over time, and we can see that the music characteristics evolve. Queen is a rock band, which explains the relatively high energy level. Interestingly, danceability, energy, liveness, loudness and speechiness all drop in 1992, while acousticness increases sharply compared to the years before. This is because Queen has only one song in 1992, the popular “We Are The Champions”. This song is quite different from the other songs Queen produced, as it has less energy and is less danceable.

It can be concluded that the music characteristics evolve over time. And despite the changes, each artist has their own uniqueness in terms of the music they produce.

A further step is to look at the relationship between track artist and musical characteristics (for example, artist on the x-axis and the characteristic on the y-axis).

We can also look at the relationship between year (or decade) and the characteristics, and make a scatter plot matrix coloured by artist.

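library(tidyverse)  # provides %>%, mutate(), pivot_longer() and ggplot()

# Derive the decade and reshape the audio features into long format.
a <- top5artists %>%
  mutate(decade = round(as.numeric(year) - 4.5, -1)) %>%
  pivot_longer(danceability:valence, names_to = "characteristics", values_to = "values")
a$year <- as.Date(a$year, format = "%Y")

aa <- characteristics_topartists %>%
  group_by(year, track_artist)

# Lines from the summarised data, points from the individual tracks, one panel per characteristic.
ggplot() +
  geom_line(data = aa, aes(x = year, y = mean, color = track_artist)) +
  geom_point(data = a, aes(x = year, y = values, color = track_artist)) +
  facet_wrap(~characteristics, scales = "free")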

The interesting one here is valence, so we look at it more closely.

To see whether the characteristics are changing over time, we can look at the correlation between each characteristic and year.

lm(year~features)
term estimate std.error statistic p.value
(Intercept) 2011.04 0.08 24943.56 0
## 
## Call:
## lm(formula = year ~ acousticness, data = spotify_songs)
## 
## Coefficients:
##  (Intercept)  acousticness  
##    2011.0434        0.5351
## 
## Call:
## lm(formula = year ~ danceability, data = spotify_songs)
## 
## Coefficients:
##  (Intercept)  danceability  
##       2002.8          12.7
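
Instead of fitting one model per feature by hand, the single-feature regressions above can be run in a loop and their slopes collected. The sketch below illustrates this in base R on simulated data; the data frame, its trends and the two features are invented for illustration, and only the `lm(year ~ feature)` form is taken from the output above.

```r
set.seed(7)
n <- 400

# Simulated songs: danceability drifts up and acousticness drifts down over the years.
songs <- data.frame(year = sample(1960:2020, n, replace = TRUE))
songs$danceability <- 0.4 + 0.004 * (songs$year - 1960) + rnorm(n, sd = 0.05)
songs$acousticness <- 0.6 - 0.005 * (songs$year - 1960) + rnorm(n, sd = 0.05)

features <- c("danceability", "acousticness")

# Fit lm(year ~ feature) for each feature and keep the slope coefficient.
slopes <- sapply(features, function(f) {
  fit <- lm(reformulate(f, response = "year"), data = songs)
  unname(coef(fit)[2])
})
print(slopes)
```

In a simple regression, the sign of the slope is the same whether year is regressed on the feature or the feature on year, so a positive slope here still indicates a feature that increases over time. Regressing the feature on year instead would give the slope as the change in the feature per year, which may be easier to interpret.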

The R packages we used in this report: Wickham (2016), Waring et al. (2020), Wei and Simko (2017), Arnold (2019), Xie, Cheng, and Tan (2020), Wickham, Hester, and Chang (2020), Wickham, Hester, and Francois (2018), Wickham et al. (2019), Wickham et al. (2020), Grolemund and Wickham (2011), Xie (2020), Zhu (2019), Robinson, Hayes, and Couch (2020), Auguie (2017), Parry and Barr (2020), Silge and Robinson (2016), Hvitfeldt (2020), Thompson (2017), Wilke (2020).

Reference

Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.

Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Hvitfeldt, Emil. 2020. Textdata: Download and Load Various Text Datasets. https://CRAN.R-project.org/package=textdata.

Parry, Josiah, and Nathan Barr. 2020. Genius: Easily Access Song Lyrics from Genius.com. https://CRAN.R-project.org/package=genius.

  • Danceability and valence are Taylor Swift’s strongest suits. Her songs are easy to dance to and are mostly positive. Keeping in mind that most of her songs are about her ex-lovers, it is surprising to see that the valence is positive. The choice of words used by Swift is mostly positive even if the song is about a topic that is generally negative. Thus, we can say that Taylor Swift’s songs are danceable and that she resonates most with an audience that loves pop songs.

Eminem (Rap Artist) Audio Features

  • Next up, it’s Eminem’s rap prowess!
Eminem Audio Features

  • Eminem’s songs generally have high values for danceability, energy and also speechiness, which is an obvious characteristic for a rapper. The valence of Eminem’s songs is negative, which shows that Eminem usually uses negative words in his songs. This doesn’t affect the popularity of his songs, as this is his strongest suit. His audience loves the content of his lyrics even though it is negative, perhaps because he is honest about his life and that resonates with his audience.

AC/DC (Rock Artist) Audio Features

  • Next, it’s AC/DC’s rocking songs!
AC/DC Audio Features

  • We can clearly observe that AC/DC are best known for the energy they bring to their songs. Rock songs are usually more energetic, and this observation supports that. Even though they lack valence and danceability, they’re a popular artist known for the energy they bring!

Shakira (Latin Artist) Audio Features

  • Let’s look at Shakira’s Latin songs!
Shakira Audio Features

  • Shakira’s songs have a good mix of danceability and energy, and mostly positive valence values. This is a sweet spot in terms of audio features, and it is no surprise that her tunes make people dance without realising it! She is one of the most loved Latin and pop singers of the modern era.

Usher (R&B Artist) Audio Features

  • Let’s look at Usher’s Rhythm and Blues!
Usher Audio Features

  • Usher’s songs are generally slow and melodious, which is reflected in the low energy observed in the plot. Danceability is his strongest feature, and his songs tend to be more positive as well. This is the kind of music that’s enjoyed by people who love melodic and rhythmic music.

David Guetta (EDM Artist) Audio Features

  • Finally, David Guetta’s Electronic Dance Music!
David Guetta Audio Features

  • David Guetta, a DJ, is best known for the energy he brings to his songs. His songs are fairly danceable as well as energetic. This makes David Guetta’s songs popular even though his valence is poor, with mostly negative lyrical words.

Mean values for the audio features are as follows -

Danceability -

  • Taylor Swift - 0.6701429
  • Eminem - 0.7346154
  • AC/DC - 0.51084
  • Shakira - 0.7347
  • Usher - 0.7614706
  • David Guetta - 0.61415

Speechiness -

  • Taylor Swift - 0.07605
  • Eminem - 0.2390026
  • AC/DC - 0.094764
  • Shakira - 0.09703
  • Usher - 0.0963118
  • David Guetta - 0.078215

Energy -

  • Taylor Swift - 0.6827143
  • Eminem - 0.7578462
  • AC/DC - 0.82136
  • Shakira - 0.7417
  • Usher - 0.5531176
  • David Guetta - 0.8260667

Valence -

  • Taylor Swift - 0.6375
  • Eminem - 0.4266256
  • AC/DC - 0.53016
  • Shakira - 0.76085
  • Usher - 0.6858235
  • David Guetta - 0.3748767
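
Per-artist means like the ones listed above can be computed in a single grouped aggregation. The sketch below is a minimal base-R illustration with invented artists and values; it is not the code used to produce the numbers above.

```r
# Invented tracks for two hypothetical artists (values are for illustration only).
tracks <- data.frame(
  track_artist = c("Artist A", "Artist A", "Artist B", "Artist B"),
  danceability = c(0.60, 0.80, 0.50, 0.40),
  speechiness  = c(0.05, 0.07, 0.20, 0.30),
  energy       = c(0.70, 0.90, 0.80, 0.60),
  valence      = c(0.60, 0.40, 0.30, 0.50)
)

# Mean of each selected audio feature per artist.
artist_means <- aggregate(
  cbind(danceability, speechiness, energy, valence) ~ track_artist,
  data = tracks,
  FUN = mean
)
print(artist_means)
```

Arranging the result as one row per artist and one column per feature also makes it easier to compare artists side by side than four separate lists.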

Conclusion

  • Firstly, audio features are correlated with track popularity, some positively and some negatively. However, as we all know, the value of an artwork can’t be measured by numbers alone. The popularity of a piece of music depends more on the artist’s own popularity, creative talent or singing ability, or on external factors such as world trends. Deliberately catering to audio features when creating a song is not sufficient to make it successful.

  • Secondly, each top artist has their own artistic characteristics and is loved by specific groups of people. Top artists do not create music according to the trend; instead, they create their own trend for the world. We can observe that each popular artist has their own strong audio features and is loved even if their valence values are low. This can be clearly observed in the section ‘Unique Features of Artists’. Each of the artists selected for the six genres is popular for their own style and musical features.

  • The six music genres that stand out in modern music also have their own characteristics. It’s hard to understand the reasons for their success because of their unique styles. What we can do is determine the genre of each song according to its style. We can extend the Unique Features analysis to any artist to understand why exactly they are popular.

  • Although Coldplay are one of the representative rock artists, their works contain more negative emotions. This is in line with the rebellious and critical spirit of rock music, a spirit that has always been respected by young people of different races. They stick to their own style, try unconventional music routines as far as possible, and reach people’s hearts with straightforward, profound and moving melodies. This also confirms our analysis that the lyrics of Coldplay’s songs convey negative emotions, which does not affect their popularity but makes them top artists. This is also in line with other artists such as Eminem and AC/DC, whose valence is low.

  • In conclusion, track popularity depends more on the singer’s own ability and attitude than on audio features. The biggest role of audio features is to reflect the singer’s music style rather than to increase popularity.

References